To Be or Not to Be a Zero Pronoun: a Machine Learning Approach for Romanian

Authors

  • Claudiu MIHĂILĂ
  • Diana INKPEN
Abstract

This paper presents a new study on the distribution and identification of zero pronouns in Romanian. A Romanian corpus that includes legal, encyclopaedic, literary, and news texts has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments have been performed on the created corpus for the identification of verbs which have a zero pronoun in the subject position. The evaluation results highlight that zero pronouns appear frequently in Romanian, and their distribution depends largely on the genre. Additionally, a search scope for the antecedent has been determined, increasing the chances of correct resolution. Furthermore, more than 70% of the zero pronouns have been accurately identified by various machine learning algorithms. The strong similarity between our results and those obtained for other Romance languages supports our conclusions.

Similar resources

Debt Collection Industry: Machine Learning Approach

Businesses are increasingly interested in how big data, artificial intelligence, machine learning, and predictive analytics can be used to increase revenue, lower costs, and improve their business processes. In this paper, we describe how we have developed a data-driven machine learning method to optimize the collection process for a debt collection agency. Precisely speaking, we create a frame...

A Deep Neural Network for Chinese Zero Pronoun Resolution

This paper investigates the problem of Chinese zero pronoun resolution. Most existing approaches are based on machine learning algorithms, using hand-crafted features, which is labor-intensive. Moreover, semantic information that is essential in the resolution of noun phrases has not been addressed enough by previous approaches on zero pronoun resolution. This is because zero pronouns have...

Resolving Romanian Zero Pronouns: A Machine Learning Approach

This paper presents a new study on the distribution, identification, and resolution of zero pronouns in Romanian. A Romanian corpus, including legal, encyclopaedic, literary, and news texts, has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments were performed on the created corpus for the identification and ...

Identification and Resolution of Chinese Zero Pronouns: A Machine Learning Approach

In this paper, we present a machine learning approach to the identification and resolution of Chinese anaphoric zero pronouns. We perform both identification and resolution automatically, with two sets of easily computable features. Experimental results show that our proposed learning approach achieves anaphoric zero pronoun resolution accuracy comparable to a previous state-of-the-art, heuristi...

An Effective Approach for Robust Metric Learning in the Presence of Label Noise

Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...

Publication date: 2010